A phrase-level machine translation approach for disfluency detection using weighted finite state transducers
نویسندگان
چکیده
We propose a novel algorithm to detect disfluency in speech by reformulating the problem as phrase-level statistical machine translation using weighted finite state transducers. We approach the task as translation of noisy speech to clean speech. We simplify our translation framework such that it does not require fertility and alignment models. We tested our model on the Switchboard disfluency-annotated corpus. Using an optimized decoder that is developed for phrase-based translation at IBM, we are able to detect repeats, repairs and filled pauses for more than a thousand sentences in less than a second with encouraging results.
منابع مشابه
A Finite-State Approach to Phrase-Based Statistical Machine Translation
This paper presents a finite-state approach to phrase-based statistical machine translation where a log-linear modelling framework is implemented by means of an on-the-fly composition of weighted finite-state transducers. Moses, a well-known state-of-the-art system, is used as a machine translation reference in order to validate our results by comparison. Experiments on the TED corpus achieve a...
متن کاملHierarchical phrase-based translation with weighted finite state transducers
This dissertation is focused in the Statistical Machine Translation field (SMT), particularly in hierarchical phrase-based translation frameworks. We first study and redesign hierarchical models using several filtering techniques. Hierarchical search spaces are based on automatically extracted translation rules. As originally defined they are too big to handle directly without filtering. In thi...
متن کاملACL 2008 THIRD WORKSHOP ON STATISTICAL MACHINE TRANSLATION http://www.statmt.org European Language Translation with Weighted Finite State Transducers: The CUED MT System for the 2008 ACL Workshop on SMT
We describe the Cambridge University Engineering Department phrase-based statistical machine translation system for SpanishEnglish and French-English translation in the ACL 2008 Third Workshop on Statistical Machine Translation Shared Task. The CUED system follows a generative model of translation and is implemented by composition of component models realised as Weighted Finite State Transducer...
متن کاملThe Johns Hopkins University 2003 Chinese-English machine translation system
We describe a Chinese to English Machine Translation system developed at the Johns Hopkins University for the NIST 2003 MT evaluation. The system is based on a Weighted Finite State Transducer implementation of the alignment template translation model for statistical machine translation. The baseline MT system was trained using 100,000 sentence pairs selected from a static bitext training colle...
متن کاملHierarchical Phrase-Based Translation with Weighted Finite-State Transducers and Shallow-<italic>n</italic> Grammars
In this article we describe HiFST, a lattice-based decoder for hierarchical phrase-based translation and alignment. The decoder is implemented with standard Weighted Finite-State Transducer (WFST) operations as an alternative to the well-known cube pruning procedure. We find that the use of WFSTs rather than k-best lists requires less pruning in translation search, resulting in fewer search err...
متن کامل